We congratulate Kang and Schafer (KS) on their excellent article comparing various estimators of a population mean in the presence of missing data, and thank the Editor for organizing the discussion. In this communication, we systematically examine the propensity score (PS) and the outcome regression (OR) approaches and doubly robust (DR) estimation, which are all discussed by KS. The aim is to clarify and improve our understanding of these three interrelated subjects.

Sections 1 and 2 contain the following main points, 
respectively. 

(a) OR and PS are two approaches with different characteristics, and one does not necessarily dominate the other. The OR approach suffers from the problem of implicit extrapolation. The PS-weighting approach tends to yield large weights, explicitly indicating uncertainty in the estimate.

(b) It seems more constructive to view DR esti- 
mation in the PS approach by incorporating an OR 
model rather than in the OR approach by incorpo- 
rating a PS model. Tan's (2006) DR estimator can 
be used to improve upon any initial PS-weighting 
estimator with both variance and bias reduction. 

Finally, Section 3 presents miscellaneous comments. 

1. UNDERSTANDING OR AND PS 

For a population, let X be a vector of (pretreatment) covariates, T be the treatment status, and Y be the observed outcome given by $(1 - T)Y_0 + TY_1$, where $(Y_0, Y_1)$ are potential outcomes. The observed data consist of independent and identically distributed copies $(X_i, T_i, Y_i)$, $i = 1, \ldots, n$. Assume that T and $(Y_0, Y_1)$ are conditionally independent given X. The objective is to estimate $\mu_1 = E(Y_1)$ and $\mu_0 = E(Y_0)$, and their difference, $\mu_1 - \mu_0$, which gives the average causal effect (ACE). KS throughout focused on the problem of estimating $\mu_1$ from the data $(X_i, T_i, T_iY_i)$, $i = 1, \ldots, n$, only, noting in Section 1.2 that estimation of the ACE can be separated into independent estimation of the means $\mu_1$ and $\mu_0$. We shall in Section 3 discuss subtle differences between causal inference and solving two separate missing-data problems, but until then we shall restrict our attention to estimation of $\mu_1$ from $(X_i, T_i, T_iY_i)$ only.

The model described at this stage is completely nonparametric. No parametric modeling assumption is made on either the regression function $m_1(X) = E(Y \mid T = 1, X)$ or the propensity score $\pi(X) = P(T = 1 \mid X)$. Robins and Rotnitzky (1995) and Hahn (1998) established the following fundamental result for semiparametric (or more precisely, nonparametric) estimation of $\mu_1$.

Proposition 1. Under certain regularity conditions, there exists a unique influence function, which hence must be the efficient influence function, given by

$$\tau_1 = \frac{TY}{\pi(X)} - \left(\frac{T}{\pi(X)} - 1\right) m_1(X) - \mu_1 = \frac{T\{Y - m_1(X)\}}{\pi(X)} + m_1(X) - \mu_1.$$

The semiparametric variance bound (i.e., the lowest asymptotic variance any regular estimator of $\mu_1$ can achieve) is $n^{-1}E(\tau_1^2)$.

The semiparametric variance bound depends on both $m_1(X)$ and $\pi(X)$. The bound becomes large or even infinite whenever $\pi(X) \approx 0$ for some values of X. Intuitively, it becomes difficult to infer the overall mean of $Y_1$ in this case, because very few values of $Y_1$ are observed among subjects with $\pi(X) \approx 0$. The difficulty holds whichever parametric approach, OR or PS, is taken for inference, although the symptoms can differ. This point is central to our subsequent discussion.
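A small numerical illustration of this point uses a toy model that is our own assumption, not anything in KS. Take scalar $X \sim N(0, 1)$ and $m_1(X) = 1 + X$, so $\mu_1 = 1$. Let $Y_1 = m_1(X) + \epsilon$ with standard normal $\epsilon$, and $\pi(X) = \mathrm{expit}(a + X)$. In this model $E(\tau_1^2) = \mathrm{var}\{m_1(X)\} + E\{\mathrm{var}(Y_1 \mid X)/\pi(X)\} = 2 + e^{0.5 - a}$. Lowering $a$ puts more subjects near $\pi(X) \approx 0$ and so inflates the bound. The Python sketch below checks this by Monte Carlo; the function names are hypothetical.

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def efficient_influence(T, TY, pi, m1, mu1):
    # tau_1 = T*Y/pi(X) - (T/pi(X) - 1) * m1(X) - mu_1
    return TY / pi - (T / pi - 1.0) * m1 - mu1

def scaled_bound(a, n_mc=400_000, seed=0):
    """Monte Carlo estimate of E(tau_1^2), i.e. n times the variance bound (toy model)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal(n_mc)
    pi = expit(a + X)                          # true propensity score (assumed)
    m1 = 1.0 + X                               # true regression function, so mu_1 = 1
    T = rng.binomial(1, pi)
    TY = T * (m1 + rng.standard_normal(n_mc))  # Y_1 enters only through T*Y
    tau = efficient_influence(T, TY, pi, m1, mu1=1.0)
    return np.mean(tau ** 2)

for a in (1.0, 0.0, -2.0):
    print(f"a = {a:+.1f}: Monte Carlo {scaled_bound(a):7.2f}, exact {2 + np.exp(0.5 - a):7.2f}")
```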

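The problem of estimating $\mu_1$ is typically handled by introducing parametric modeling assumptions on either $m_1(X)$ or $\pi(X)$. The OR approach is to specify an OR model, say $m_1(X;\alpha)$, for $m_1(X)$ and then estimate $\mu_1$ by

$$\hat\mu_{\mathrm{OR}} = \frac{1}{n}\sum_{i=1}^n \hat m_1(X_i),$$

where $\hat m_1(X)$ is the fitted response. The PS approach is to specify a PS model, say $\pi(X;\gamma)$, for $\pi(X)$ and then estimate $\mu_1$ by

$$\hat\mu_{\mathrm{IPW}} = \frac{1}{n}\sum_{i=1}^n \frac{T_iY_i}{\hat\pi(X_i)}$$

or

$$\hat\mu_{\mathrm{IPW}} = \sum_{i=1}^n \frac{T_iY_i}{\hat\pi(X_i)} \Big/ \sum_{i=1}^n \frac{T_i}{\hat\pi(X_i)},$$

where $\hat\pi(X)$ is the fitted propensity score. The idea of inverse probability weighting (IPW) is to recover the joint distribution of $(X, Y_1)$ by attaching weight proportional to $\hat\pi^{-1}(X_i)$ to each point in $\{(X_i, Y_i) : T_i = 1\}$ (see Tan, 2006, for a likelihood formulation). More generally, consider the following class of augmented IPW estimators $\hat\mu_{\mathrm{AIPW}} = \hat\mu_{\mathrm{AIPW}}(h)$ depending on a known function $h(X)$:

$$\hat\mu_{\mathrm{AIPW}}(h) = \frac{1}{n}\sum_{i=1}^n \frac{T_iY_i}{\hat\pi(X_i)} - \frac{1}{n}\sum_{i=1}^n \left(\frac{T_i}{\hat\pi(X_i)} - 1\right) h(X_i).$$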
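These estimators translate directly into code. The Python sketch below assumes that the fitted values $\hat m_1(X_i)$ and $\hat\pi(X_i)$, and a known $h(X_i)$, have already been computed from whatever OR and PS models the analyst chose. The function names are ours. Missing outcomes for $T_i = 0$ may be stored as NaN, because only $T_iY_i$ enters.

```python
import numpy as np

def _ty(T, Y):
    # T_i * Y_i, with unobserved outcomes (T_i = 0, possibly NaN) set to zero
    return np.where(T == 1, Y, 0.0)

def mu_or(m1_hat):
    """OR estimator: average of fitted responses over all n subjects."""
    return float(np.mean(m1_hat))

def mu_ipw_raw(T, Y, pi_hat):
    """IPW estimator, raw version: (1/n) sum T_i Y_i / pi_hat(X_i)."""
    return float(np.mean(_ty(T, Y) / pi_hat))

def mu_ipw_ratio(T, Y, pi_hat):
    """IPW estimator, ratio version: weights T_i / pi_hat(X_i) normalized to sum to one."""
    w = T / pi_hat
    return float(np.sum(w * _ty(T, Y)) / np.sum(w))

def mu_aipw(T, Y, pi_hat, h):
    """Augmented IPW estimator for a known function h evaluated at the X_i."""
    return float(np.mean(_ty(T, Y) / pi_hat) - np.mean((T / pi_hat - 1.0) * h))

T = np.array([1, 0, 1, 1])
Y = np.array([2.0, np.nan, 4.0, 6.0])
pi_hat = np.full(4, 0.5)
print(mu_ipw_raw(T, Y, pi_hat), mu_ipw_ratio(T, Y, pi_hat), mu_aipw(T, Y, pi_hat, np.ones(4)))
# → 6.0 4.0 5.5
```

Setting $h \equiv 0$ in `mu_aipw` recovers the raw IPW estimator.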

A theoretical comparison of the two approaches is given by the following proposition.

Proposition 2. Assume that an OR model is correctly specified and $m_1(X)$ is efficiently estimated with adaptation to heteroscedastic $\mathrm{var}(Y_1 \mid X)$, and that a PS model is correctly specified and $\pi(X)$ may or may not be efficiently estimated. Then

$$\mathrm{asy.var}(\hat\mu_{\mathrm{OR}}) \le \mathrm{asy.var}(\hat\mu_{\mathrm{AIPW}}),$$

where asy.var denotes asymptotic variance as $n \to \infty$.



In fact, the asymptotic variance of $\hat\mu_{\mathrm{OR}}$, which is the lowest under the parametric OR model, is no greater than the semiparametric variance bound under the nonparametric model. By contrast, that of $\hat\mu_{\mathrm{AIPW}}$ is no smaller than $n^{-1}E(\tau_1^2)$, because $\tau_1$ has the smallest variance among $\pi^{-1}(X)TY - (\pi^{-1}(X)T - 1)h(X)$ over all functions $h(X)$. In the degenerate case where $m_1(X)$ and $\pi(X)$ are known, the comparison can be attributed to Rao-Blackwellization because $E[\pi^{-1}(X)TY - (\pi^{-1}(X)T - 1)h(X) \mid X] = m_1(X)$. This result has interesting implications for understanding the two approaches.

First, the result formalizes the often-heard statement that the (A)IPW estimator is no more efficient than the OR estimator. Suppose a correct OR model and a correct PS model were placed in two black boxes, and a statistician were asked to open one and only one box. Then the statistician should choose the box for the OR model in terms of asymptotic efficiency (minus the complication due to adaptation to heteroscedastic variance of $Y_1$ given X). However, one could immediately argue that this comparison is only of phantom significance, because all models (by human efforts) are wrong (in the presence of high-dimensional X) and therefore the hypothetical situation never occurs. In this sense, we emphasize that the result does not establish any absolute superiority of the OR approach over the PS approach.
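In the degenerate case with known $m_1(X)$ and $\pi(X)$, the ordering in Proposition 2 reduces to comparing per-observation variances, which a short simulation can display. The toy model is an assumption for illustration only: $X \sim N(0, 1)$, $m_1(X) = 1 + X$, unit-variance noise, and $\pi(X) = \mathrm{expit}(X)$. For this model the three variances computed below are 1, $2 + e^{0.5} \approx 3.65$ and $2 + 2e^{0.5} \approx 5.30$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400_000
X = rng.standard_normal(n)
pi = 1.0 / (1.0 + np.exp(-X))           # known propensity score (toy assumption)
m1 = 1.0 + X                            # known regression function
T = rng.binomial(1, pi)
TY = T * (m1 + rng.standard_normal(n))  # only T*Y_1 is observable

def aipw_summand(h):
    # pi^{-1}(X) T Y - (pi^{-1}(X) T - 1) h(X); its conditional mean given X is m1(X)
    return TY / pi - (T / pi - 1.0) * h

var_or = np.var(m1)                     # oracle OR summand
var_dr = np.var(aipw_summand(m1))       # h = m1: tau_1 + mu_1
var_ipw = np.var(aipw_summand(0.0))     # h = 0: plain IPW summand
print(var_or, var_dr, var_ipw)
```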

Second, even though it does not imply that one approach is better than the other, the result does shed light on different characteristics of the two approaches as approximations to the ideal nonparametric estimation. Typically, increasingly complicated but nested parametric models can be specified in either approach to reduce the dependency on modeling assumptions. For a sequence of OR models, the asymptotic variance of $\hat\mu_{\mathrm{OR}}$ increases to the semiparametric variance bound. For a sequence of PS models, the asymptotic variance of $\hat\mu_{\mathrm{AIPW}}$ decreases to the semiparametric variance bound. Because of this difference, we suggest that the OR approach is aggressive and the PS approach is conservative.

Correctly specifying an OR model ensures that $\hat\mu_{\mathrm{OR}}$ is consistent and has asymptotic variance no greater than what would otherwise be best attained without any modeling assumption. Correctly specifying a PS model ensures that $\hat\mu_{\mathrm{AIPW}}$ is consistent and has asymptotic variance no smaller than that. This interpretation agrees with the finding of Tan (2006) that the OR approach works directly with the usual likelihood. The PS approach, in contrast, retains part of the information on the joint distributions of covariates and potential outcomes and therefore ignores the other part.

Now the real, hard questions facing a statistician 
are: 

(a) Which task is more likely to be accomplished, 
to correctly specify an OR model or a PS model? 

(b) Which mistake (even a mild one) can lead to 
worse estimates, misspecification of an OR model or 
a PS model? 

First of all, it seems that no definite comparison 
is possible, because answers to both questions de- 
pend on unmeasurable factors such as the statisti- 
cian's effort and experience for question (a) and the 
degree and direction of model misspecification for 
question (b). Nevertheless, some informal compar- 
isons are worth considering. 

Regarding question (a), a first answer might be "equally likely," because both models involve the same vector of explanatory variables X. However, the two tasks have different forms of difficulty.

OR-model building works on the "truncated" data $\{(X_i, Y_i) : T_i = 1\}$ within treated subjects. Therefore, any OR model relies on extrapolation to predict $m_1(X)$ at values of X that are different from those for most treated subjects [i.e., where $\pi(X) \approx 0$]. The usual model checking is not capable of detecting OR-model misspecification, whether mild or gross, in this region of X. (Note that finding high-leverage observations can point to the existence of such a region of X, not to model misspecification.) This problem holds for low- or high-dimensional X. It is separate from the difficulty of capturing $m_1(X)$ within treated subjects when X is high-dimensional [cf. KS's discussion below display (2)].

In contrast, PS-model building works on the "full" data $\{(X_i, T_i)\}$ and does not suffer from data truncation, although it suffers from the same curse of dimensionality. Model checking is capable of detecting PS-model misspecification. The matter of concern is that successful implementation is difficult when X is high-dimensional.

Regarding question (b), KS (Section 2.1) suggested that the (A)IPW estimator is sensitive to misspecification of the PS model when $\pi(X) \approx 0$ for some values of X. For example, suppose $\pi(X) = 0.01$ is underestimated at 0.001. Even though the absolute bias is small (0.009), the weight $\pi^{-1}(X)$ is overestimated tenfold (1000 instead of 100). In this case, the estimator has an inflated standard error, which can be much greater than its bias. In contrast, if the OR model is misspecified, then the bias of the OR estimator is the average of those of $\hat m_1(X)$ across individual subjects in the original scale. This bias can be of similar magnitude to its standard deviation.
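The arithmetic in this example can be spelled out in a few lines of Python. An absolute error of only 0.009 on the probability scale becomes a tenfold error in the inverse-probability weight.

```python
pi_true, pi_hat = 0.01, 0.001
abs_error = pi_true - pi_hat                 # small on the probability scale
w_true, w_hat = 1 / pi_true, 1 / pi_hat      # inverse-probability weights
print(f"absolute error in pi: {abs_error:.3f}")
print(f"weight {w_hat:.0f} instead of {w_true:.0f}: {w_hat / w_true:.0f} times too large")
```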

In summary, OR and PS are two approaches with different characteristics. If an OR model is correctly specified, then the OR estimator is consistent and has asymptotic variance no greater than the semiparametric variance bound. Because of data truncation, any OR model suffers from the problem of implicit extrapolation at values of X with $\pi(X) \approx 0$. Finding high-leverage observations in model checking can point to the existence of such values of X.

In contrast, the PS approach specifically examines $\pi(X)$ and addresses data truncation by weighting to recover the joint distribution of $(X, Y_1)$. The weights are necessarily large for treated subjects with $\pi(X) \approx 0$. In that case the standard error is large, explicitly indicating uncertainty in the estimate. If a PS model is correctly specified, then the (A)IPW estimator is consistent and has asymptotic variance no smaller than the semiparametric variance bound.

2. UNDERSTANDING DR 

The OR or the (A)IPW estimator requires specification of an OR or a PS model, respectively. In contrast, a DR estimator uses the two models in a manner such that it remains consistent if either the OR or the PS model is correctly specified. The prototypical DR estimator of Robins, Rotnitzky and Zhao (1994) is

$$\hat\mu_{\mathrm{AIPW,fix}} = \frac{1}{n}\sum_{i=1}^n \frac{T_iY_i}{\hat\pi(X_i)} - \frac{1}{n}\sum_{i=1}^n \left(\frac{T_i}{\hat\pi(X_i)} - 1\right)\hat m_1(X_i) = \frac{1}{n}\sum_{i=1}^n \hat m_1(X_i) + \frac{1}{n}\sum_{i=1}^n \frac{T_i\{Y_i - \hat m_1(X_i)\}}{\hat\pi(X_i)}.$$

The two equivalent expressions [resp. (9) and (8) in KS] correspond to those for the efficient influence function $\tau_1$ in Proposition 1. Proposition 3 collects theoretical comparisons between the three estimators.
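The algebraic equivalence of the two expressions is easy to confirm numerically. In the Python sketch below, the fitted values `pi_hat` and `m1_hat` are arbitrary stand-ins we made up, and the OR "fit" is deliberately imperfect. The identity holds for any values, since $TY/\hat\pi - (T/\hat\pi - 1)\hat m_1 = \hat m_1 + T(Y - \hat m_1)/\hat\pi$ term by term.

```python
import numpy as np

def mu_aipw_fix(T, Y, pi_hat, m1_hat):
    """Both expressions of the prototypical DR estimator; they agree up to rounding."""
    treated = T == 1
    TY = np.where(treated, Y, 0.0)                  # T_i Y_i
    resid = np.where(treated, Y - m1_hat, 0.0)      # T_i (Y_i - m1_hat(X_i))
    ipw_form = np.mean(TY / pi_hat) - np.mean((T / pi_hat - 1.0) * m1_hat)
    or_form = np.mean(m1_hat) + np.mean(resid / pi_hat)
    return ipw_form, or_form

rng = np.random.default_rng(2)
n = 1000
X = rng.standard_normal(n)
pi_hat = 1.0 / (1.0 + np.exp(-(0.2 + X)))          # stand-in fitted propensity scores
T = rng.binomial(1, pi_hat)
Y = np.where(T == 1, 1.0 + 2.0 * X + rng.standard_normal(n), np.nan)
m1_hat = 0.9 + 1.8 * X                              # stand-in, deliberately imperfect OR fit
a, b = mu_aipw_fix(T, Y, pi_hat, m1_hat)
print(a, b)
```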

Proposition 3. The following statements hold:

(i) $\hat\mu_{\mathrm{AIPW,fix}}$ is doubly robust.

(ii) $\hat\mu_{\mathrm{AIPW,fix}}$ is locally efficient: if a PS and an OR model are correctly specified, then $\hat\mu_{\mathrm{AIPW,fix}}$ achieves the semiparametric variance bound and hence

$$\mathrm{asy.var}(\hat\mu_{\mathrm{AIPW,fix}}) \le \mathrm{asy.var}(\hat\mu_{\mathrm{AIPW}}).$$

(iii) If an OR model is correctly specified and $m_1(X)$ is efficiently estimated in $\hat\mu_{\mathrm{OR}}$, then

$$\mathrm{asy.var}(\hat\mu_{\mathrm{AIPW,fix}}) \ge \mathrm{asy.var}(\hat\mu_{\mathrm{OR}}).$$



Compared with $\hat\mu_{\mathrm{OR}}$, $\hat\mu_{\mathrm{AIPW,fix}}$ is more robust in terms of bias if the OR model is misspecified but the PS model is correctly specified. However, it is less efficient in terms of variance if the OR model is correctly specified. The usual bias-variance trade-off takes effect.

Compared with $\hat\mu_{\mathrm{AIPW}}$, $\hat\mu_{\mathrm{AIPW,fix}}$ is more robust in terms of bias if the PS model is misspecified but the OR model is correctly specified. It is also more efficient in terms of variance if both the PS and the OR models are correctly specified. The usual bias-variance trade-off seems not to exist.

Intuitively, the difference can be attributed to the characteristics of OR (being aggressive) and PS (being conservative) discussed in Section 1. It is possible for the PS approach to reduce both bias and variance by incorporating an OR model, but not so for the OR approach by incorporating a PS model.

Local efficiency implies that if the PS model is correctly specified, then $\hat\mu_{\mathrm{AIPW,fix}}$ gains efficiency over $\hat\mu_{\mathrm{AIPW}}$ for every function $h(X)$, provided the OR model is also correctly specified. A more desirable situation is to find an estimator that is not only doubly robust and locally efficient but also, whenever the PS model is correctly specified, guaranteed to gain efficiency over $\hat\mu_{\mathrm{AIPW}}$ for any initial, fixed function $h(X)$. For simplicity, consider $\hat\mu_{\mathrm{IPW}}$, corresponding to $h(X) = 0$, as the initial estimator. In this case, consider Tan's (2006) regression (tilde) estimator

$$\tilde\mu_{\mathrm{REG}} = \frac{1}{n}\sum_{i=1}^n \frac{T_iY_i}{\hat\pi(X_i)} - \tilde\beta^{(1)} \frac{1}{n}\sum_{i=1}^n \left(\frac{T_i}{\hat\pi(X_i)} - 1\right)\hat m_1(X_i).$$

Here $\tilde\beta^{(1)}$ is the first element of $\tilde\beta = \tilde E^{-1}(\hat\xi\hat\zeta^{\mathrm T})\tilde E(\hat\zeta\hat\eta)$, $\tilde E$ denotes sample average, and $\hat\eta = TY/\hat\pi(X)$. The vector of control variates $\hat\xi$ has first element $(T/\hat\pi(X) - 1)\hat m_1(X)$; the full set is listed in Section 3.

This estimator algebraically resembles Robins, Rotnitzky and Zhao's (1995) regression (hat) estimator

$$\hat\mu_{\mathrm{REG}} = \frac{1}{n}\sum_{i=1}^n \frac{T_iY_i}{\hat\pi(X_i)} - \hat\beta^{(1)} \frac{1}{n}\sum_{i=1}^n \left(\frac{T_i}{\hat\pi(X_i)} - 1\right)\hat m_1(X_i),$$

where $\hat\beta^{(1)}$ is the first element of $\hat\beta = \tilde E^{-1}(\hat\xi\hat\xi^{\mathrm T})\tilde E(\hat\xi\hat\eta)$.

Compared with $\hat\mu_{\mathrm{AIPW,fix}}$, each estimator introduces an estimated regression coefficient, $\tilde\beta$ or $\hat\beta$, of $\hat\eta$ against control variates $\hat\xi$. Therefore, $\tilde\mu_{\mathrm{REG}}$ and $\hat\mu_{\mathrm{REG}}$ share the advantage of optimally using control variates $\hat\xi$ [Proposition 4(ii)]. See Section 3 for a discussion about "control variates" and "regression estimators." On the other hand, $\hat\beta$ is defined in the classical manner, whereas $\tilde\beta$ is specially constructed by exploiting the structure of control variates $\hat\xi$. This subtle difference underlies Proposition 4(i).
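Below is a minimal Python sketch of the classical (hat) regression estimator. It rests on assumptions of our own:

- a logistic PS model and a linear OR model in the same design matrix;
- fits by a small Newton-Raphson helper and by least squares;
- control variates $\hat\xi$ made of $(T/\hat\pi - 1)\hat m_1$ and the PS score, as listed in Section 3.

For the logistic link the score $(T - \hat\pi)[\hat\pi(1 - \hat\pi)]^{-1}\partial\hat\pi/\partial\gamma$ simplifies to $(T - \hat\pi)x$. Its sample average is zero at the maximum likelihood estimate, which is why only $\hat\beta^{(1)}$ matters in the display. We do not sketch the tilde version, whose $\tilde\beta$ also involves $\hat\zeta$.

```python
import numpy as np

def fit_logistic(Xd, T, iters=25):
    """Newton-Raphson MLE for a logistic PS model; returns fitted pi_hat."""
    gamma = np.zeros(Xd.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Xd @ gamma))
        grad = Xd.T @ (T - p)                          # score vector
        hess = Xd.T @ (Xd * (p * (1.0 - p))[:, None])  # Fisher information
        gamma = gamma + np.linalg.solve(hess, grad)
    return 1.0 / (1.0 + np.exp(-Xd @ gamma))

def mu_reg_hat(Xd, T, Y):
    """Classical regression (hat) estimator with control variates xi_hat (sketch)."""
    n = len(T)
    treated = T == 1
    pi_hat = fit_logistic(Xd, T)
    alpha, *_ = np.linalg.lstsq(Xd[treated], Y[treated], rcond=None)  # linear OR model
    m1_hat = Xd @ alpha
    eta = np.where(treated, Y, 0.0) / pi_hat                          # T Y / pi_hat
    # xi_hat: (T/pi_hat - 1) m1_hat and the logistic PS score (T - pi_hat) x
    xi = np.column_stack([(T / pi_hat - 1.0) * m1_hat, (T - pi_hat)[:, None] * Xd])
    beta = np.linalg.solve(xi.T @ xi / n, xi.T @ eta / n)             # E~^{-1}(xi xi^T) E~(xi eta)
    full = eta.mean() - beta @ xi.mean(axis=0)
    first_only = eta.mean() - beta[0] * xi[:, 0].mean()               # form used in the display
    return full, first_only

rng = np.random.default_rng(3)
n = 20_000
x = rng.standard_normal(n)
Xd = np.column_stack([np.ones(n), x])
T = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.3 + x))))                 # logistic PS (correct)
Y = np.where(T == 1, 1.0 + 2.0 * x + rng.standard_normal(n), np.nan)  # mu_1 = 1
full, first_only = mu_reg_hat(Xd, T, Y)
print(full, first_only)
```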

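Proposition 4. The following statements hold:

(i) $\tilde\mu_{\mathrm{REG}}$ and $\hat\mu_{\mathrm{REG}}$ are locally efficient, but $\tilde\mu_{\mathrm{REG}}$ is doubly robust and $\hat\mu_{\mathrm{REG}}$ is not.

(ii) If a PS model is correctly specified and $\pi(X)$ is efficiently estimated, then $\tilde\mu_{\mathrm{REG}}$ and $\hat\mu_{\mathrm{REG}}$ achieve the smallest asymptotic variance among

$$\frac{1}{n}\sum_{i=1}^n \frac{T_iY_i}{\hat\pi(X_i)} - b^{(1)}\frac{1}{n}\sum_{i=1}^n \left(\frac{T_i}{\hat\pi(X_i)} - 1\right)\hat m_1(X_i),$$

where $b^{(1)}$ is an arbitrary coefficient. The two estimators are asymptotically at least as efficient as $\hat\mu_{\mathrm{IPW}}$ and $\hat\mu_{\mathrm{AIPW,fix}}$, corresponding to $b^{(1)} = 0$ and 1.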
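The role of $b^{(1)}$ can be seen in a small replication study. To keep it short, the Python sketch below makes simplifying assumptions that are not part of Proposition 4:

- $\pi(X) = \mathrm{expit}(X)$ with $X \sim N(0, 1)$, and $\pi(X)$ is treated as known, so the PS score is not needed;
- $\hat m_1(X) = 1 + X$ is a fixed, misspecified guess for the true $m_1(X) = 1 + 2X$.

For this toy model one can show that $n$ times the variance is about 14.9 at $b^{(1)} = 0$, 9.9 at $b^{(1)} = 1$ and 8.3 at the optimal value $b^{(1)} = 2$. A coefficient estimated by regression of $\hat\eta$ on $\hat\xi$ should come close to the last.

```python
import numpy as np

rng = np.random.default_rng(4)
R, n = 2000, 400                                   # replications, sample size
X = rng.standard_normal((R, n))
pi = 1.0 / (1.0 + np.exp(-X))                      # PS treated as known
T = rng.binomial(1, pi)
Y1 = 1.0 + 2.0 * X + rng.standard_normal((R, n))   # true m1(X) = 1 + 2X, mu_1 = 1
m1_hat = 1.0 + X                                   # fixed, misspecified OR "fit"

eta = T * Y1 / pi                                  # IPW summands
xi = (T / pi - 1.0) * m1_hat                       # control variate, mean zero

def mu_b(b):
    # (1/n) sum eta_i - b (1/n) sum xi_i, one value per replication
    return eta.mean(axis=1) - b * xi.mean(axis=1)

b_hat = (xi * eta).mean(axis=1) / (xi ** 2).mean(axis=1)   # per-replication coefficient
mu_reg = mu_b(b_hat)

for label, est in [("b = 0 (IPW)", mu_b(0.0)), ("b = 1 (AIPW,fix)", mu_b(1.0)), ("b estimated", mu_reg)]:
    print(f"{label:17s} mean {est.mean():.3f}   n * var {n * est.var():.2f}")
```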

Compared with $\hat\mu_{\mathrm{AIPW,fix}}$, $\tilde\mu_{\mathrm{REG}}$ provides a more concrete improvement upon $\hat\mu_{\mathrm{IPW}}$ because it possesses three properties: optimality in using control variates, local efficiency and double robustness. Using $\tilde\mu_{\mathrm{REG}}$ achieves variance reduction if the PS model is correctly specified; the effect is maximal if the OR model is also correctly specified. It achieves bias reduction if the PS model is misspecified but the OR model is correctly specified. On the other hand, comparison between $\hat\mu_{\mathrm{OR}}$ and $\tilde\mu_{\mathrm{REG}}$ is subject to the same bias-variance trade-off as that between $\hat\mu_{\mathrm{OR}}$ and $\hat\mu_{\mathrm{AIPW,fix}}$. That is, $\tilde\mu_{\mathrm{REG}}$ is more robust than $\hat\mu_{\mathrm{OR}}$ if the OR model is misspecified but the PS model is correctly specified, but is less efficient if the OR model is correctly specified.

The preceding comparisons between $\hat\mu_{\mathrm{AIPW,fix}}$ and $\tilde\mu_{\mathrm{REG}}$ on one side and $\hat\mu_{\mathrm{OR}}$ and $\hat\mu_{\mathrm{IPW}}$ on the other present useful facts for understanding DR estimation. It seems more meaningful to consider $\hat\mu_{\mathrm{AIPW,fix}}$ or $\tilde\mu_{\mathrm{REG}}$ as an advance or improvement in the PS approach by incorporating an OR model, rather than in the OR approach by incorporating a PS model. The OR and PS models play different roles, even though both models are referred to in the concept of DR. This holds although $\hat\mu_{\mathrm{AIPW,fix}}$ can be expressed as a bias-corrected $\hat\mu_{\mathrm{OR}}$ or, equivalently, as a bias-corrected $\hat\mu_{\mathrm{IPW}}$.

This viewpoint is also supported by the construction of $\hat\mu_{\mathrm{AIPW,fix}}$ (in the first expression by Robins, Rotnitzky and Zhao, 1994) and $\tilde\mu_{\mathrm{REG}}$. Both estimators are derived under the assumption that the PS model is correct. They are then examined in the situation where the OR model is also correct, or where the PS model is misspecified but the OR model is correct (see Tan, 2006, Section 3.2).

The different characteristics discussed in Section 1 persist between the PS approach (even using $\hat\mu_{\mathrm{AIPW,fix}}$ or $\tilde\mu_{\mathrm{REG}}$ with the DR benefit) and the OR approach. If the PS model is correctly specified, the asymptotic variance of $\hat\mu_{\mathrm{AIPW}}$, $\hat\mu_{\mathrm{AIPW,fix}}$ or $\tilde\mu_{\mathrm{REG}}$ is no smaller than the semiparametric variance bound. If the OR model is correctly specified, that of $\hat\mu_{\mathrm{OR}}$ is no greater than the bound. Moreover, if the OR model is correct, the asymptotic variance of $\hat\mu_{\mathrm{AIPW,fix}}$ or $\tilde\mu_{\mathrm{REG}}$ is still no smaller than that of $\hat\mu_{\mathrm{OR}}$. Therefore:

Proposition 5. The asymptotic variance of $\hat\mu_{\mathrm{AIPW,fix}}$ or $\tilde\mu_{\mathrm{REG}}$ if either a PS or an OR model is correctly specified is no smaller than that of $\hat\mu_{\mathrm{OR}}$ if the OR model is correctly specified and $m_1(X)$ is efficiently estimated in $\hat\mu_{\mathrm{OR}}$.

Like Proposition 2, this result does not establish 
absolute superiority of the OR approach over the 
PS-DR approach. Instead, it points to considering 
practical issues of model specification and conse- 
quences of model misspecification. There seems to 
be no definite comparison, because various, unmea- 
surable factors are involved. Nevertheless, the points 
regarding questions (a) and (b) in Section 1 remain 
relevant. 

In summary, it seems more constructive to view DR estimation in the PS approach by incorporating an OR model rather than in the OR approach by incorporating a PS model. The estimator $\tilde\mu_{\mathrm{REG}}$ provides a concrete improvement upon $\hat\mu_{\mathrm{IPW}}$ with both variance and bias reduction. It gains efficiency whenever the PS model is correctly specified, and maximally so if the OR model is also correctly specified. It remains consistent if the PS model is misspecified but the OR model is correctly specified. On the other hand, comparison between $\tilde\mu_{\mathrm{REG}}$ and $\hat\mu_{\mathrm{OR}}$ is complicated by the usual bias-variance trade-off. Different characteristics are associated with the OR and the PS-DR approaches and should be carefully weighed in applications.

3. OTHER COMMENTS 
Control Variates and Regression Estimators 

The name "regression estimator" is adopted from the literature on survey sampling (e.g., Cochran, 1977, Chapter 7) and Monte Carlo integration (e.g., Hammersley and Handscomb, 1964). It should be distinguished from "regression estimation" described by KS (Section 2.3). Specifically, the idea is to exploit the fact that if the PS model is correct, then $\hat\eta$ asymptotically has mean $\mu_1$ (to be estimated) and $\hat\xi$ has mean 0 (known). That is, $\hat\xi$ serves as auxiliary variables (in the terminology of survey sampling) or control variates (in that of Monte Carlo integration). Variance reduction can be achieved by using $\tilde E(\hat\eta) - b\tilde E(\hat\xi)$ instead of $\hat\mu_{\mathrm{IPW}} = \tilde E(\hat\eta)$, with $b$ an estimated regression coefficient of $\hat\eta$ against $\hat\xi$.
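Here is the same device in its original Monte Carlo setting, with an integrand we chose purely for illustration. We estimate $E(e^U) = e - 1$ for $U \sim \mathrm{Uniform}(0, 1)$, using $\xi = U - 1/2$ (known mean 0) as a control variate. The helper, whose name is ours, mirrors the uncentered coefficient $\tilde E^{-1}(\hat\xi\hat\xi^{\mathrm T})\tilde E(\hat\xi\hat\eta)$ from Section 2 and accepts several control variates as columns.

```python
import numpy as np

def control_variate_estimate(eta, xi):
    """E~(eta) - b^T E~(xi), with b = E~^{-1}(xi xi^T) E~(xi eta); xi has known mean 0."""
    xi = np.asarray(xi, dtype=float).reshape(len(eta), -1)   # columns = control variates
    b = np.linalg.solve(xi.T @ xi, xi.T @ eta)
    return eta.mean() - b @ xi.mean(axis=0), b

rng = np.random.default_rng(5)
u = rng.uniform(size=10_000)
eta = np.exp(u)        # target: E(e^U) = e - 1 for U ~ Uniform(0, 1)
xi = u - 0.5           # control variate with known mean 0
plain = eta.mean()
cv, b = control_variate_estimate(eta, xi)
print(f"plain {plain:.5f}, control variate {cv:.5f}, exact {np.e - 1:.5f}, b {b[0]:.3f}")
```

Because $e^U$ and $U$ are almost perfectly correlated, the adjusted estimate has a much smaller standard error than the plain average.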

The control variates for $\tilde\mu_{\mathrm{REG}}$ in Section 2 include $(\pi^{-1}T - 1)m_1$ and $(T - \pi)[\pi(1 - \pi)]^{-1}\partial\pi/\partial\gamma$. The second is the score function for the PS model and is necessary for asymptotic optimality in Proposition 4(ii). If the PS model is correct, then $\tilde\mu_{\mathrm{REG}}$ is always at least as efficient as $\hat\mu_{\mathrm{IPW}}$ in the raw version, that is, $\hat\mu_{\mathrm{AIPW}}(0)$. It is not always at least as efficient as $\hat\mu_{\mathrm{IPW}}$ in the ratio version. However, this indefiniteness can be easily resolved. If the control variate $\pi^{-1}T - 1$ is added, or $(1, m_1)^{\mathrm T}$ is substituted for $m_1$, then $\tilde\mu_{\mathrm{REG}}$ always gains efficiency over both versions of $\hat\mu_{\mathrm{IPW}}$. Furthermore, if $(1, h, m_1)^{\mathrm T}$ is substituted for $m_1$, then $\tilde\mu_{\mathrm{REG}}$ also always gains efficiency over the estimator $\hat\mu_{\mathrm{AIPW}}(h)$.

Causal Inference 

Causal inference involves estimation of both $\mu_1$ and $\mu_0$. Similar estimators of $\mu_0$ can be separately defined by replacing T, $\pi$ and $m_1$ with $1 - T$, $1 - \pi$ and $m_0$, where $m_0 = E(Y \mid T = 0, X)$. The control variates $((1 - \pi)^{-1}(1 - T) - 1)(1, m_0)^{\mathrm T}$ for estimating $\mu_0$ differ from $(\pi^{-1}T - 1)(1, m_1)^{\mathrm T}$ for estimating $\mu_1$. As a consequence, $\tilde\mu_{1,\mathrm{REG}}$ and $\tilde\mu_{0,\mathrm{REG}}$ may individually gain efficiency over $\hat\mu_{1,\mathrm{IPW}}$ and $\hat\mu_{0,\mathrm{IPW}}$. Even so, the difference $\tilde\mu_{1,\mathrm{REG}} - \tilde\mu_{0,\mathrm{REG}}$ does not necessarily gain efficiency over $\hat\mu_{1,\mathrm{IPW}} - \hat\mu_{0,\mathrm{IPW}}$.

The problem can be overcome by using a combined set of control variates, say $[\pi^{-1}T - (1 - \pi)^{-1}(1 - T)](\pi, 1 - \pi, \pi m_0, (1 - \pi)m_1)^{\mathrm T}$. Then $\tilde\mu_{1,\mathrm{REG}} - \tilde\mu_{0,\mathrm{REG}}$ maintains optimality in using control variates in the sense of Proposition 4(ii), in addition to local efficiency and double robustness. The mechanism of using a common set of control variates for estimating both $\mu_1$ and $\mu_0$ is automatic in the likelihood PS approach of Tan (2006).
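Below is a Python sketch of the combined control variates for the ACE. It treats $\pi$, $m_0$ and $m_1$ as known functions, an oracle simplification of our own; in practice fitted values and the PS score would enter. It forms the IPW contrast $\eta = TY/\pi - (1 - T)Y/(1 - \pi)$ and regresses it on the four control variates $[\pi^{-1}T - (1 - \pi)^{-1}(1 - T)](\pi, 1 - \pi, \pi m_0, (1 - \pi)m_1)^{\mathrm T}$. The second column reduces to $\pi^{-1}T - 1$ and the first to $-\{(1 - \pi)^{-1}(1 - T) - 1\}$, which is why this one set covers the control variates for both $\mu_1$ and $\mu_0$.

```python
import numpy as np

def ace_common_cv(T, Y, pi, m0, m1):
    """Regression estimate of mu_1 - mu_0 with one common set of control variates."""
    n = len(T)
    eta = T * Y / pi - (1 - T) * Y / (1.0 - pi)            # IPW contrast
    g = T / pi - (1 - T) / (1.0 - pi)                       # pi^{-1} T - (1 - pi)^{-1}(1 - T)
    C = g[:, None] * np.column_stack([pi, 1.0 - pi, pi * m0, (1.0 - pi) * m1])
    beta = np.linalg.solve(C.T @ C / n, C.T @ eta / n)      # uncentered regression coefficient
    return eta.mean() - beta @ C.mean(axis=0), C

rng = np.random.default_rng(9)
n = 20_000
X = rng.standard_normal(n)
pi = 1.0 / (1.0 + np.exp(-X))                               # known PS (oracle assumption)
m0, m1 = X, 1.0 + 2.0 * X                                   # known regressions, so ACE = 1
T = rng.binomial(1, pi)
Y = np.where(T == 1, m1, m0) + rng.standard_normal(n)       # observed outcome
est, C = ace_common_cv(T, Y, pi, m0, m1)
print(est)
```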

PS Stratification 

KS (Section 2.2) described the stratification estimator of Rosenbaum and Rubin (1983) as a way "to coarsen the estimated propensity score into a few categories and compute weighted averages of the mean response across categories." It is helpful to rewrite the estimator in their display (6) as

$$\hat\mu_{\mathrm{strat}} = \frac{1}{n}\sum_{i=1}^n \frac{T_iY_i}{\hat\pi_{\mathrm{strat}}(X_i)},$$

where

$$\hat\pi_{\mathrm{strat}}(X) = \frac{\sum_{i=1}^n T_i\, 1\{\hat\pi(X_i) \in \hat S_j\}}{\sum_{i=1}^n 1\{\hat\pi(X_i) \in \hat S_j\}} \quad \text{if } \hat\pi(X) \in \hat S_j,$$

with $\hat S_j$ the $j$th estimated PS stratum, $j = 1, \ldots, s$. That is, $\hat\mu_{\mathrm{strat}}$ is exactly an IPW estimator based on the discretized $\hat\pi_{\mathrm{strat}}(X)$.

Comparison between $\hat\mu_{\mathrm{strat}}$ and $\hat\mu_{\mathrm{IPW}}$ is subject to the usual bias-variance trade-off. On one hand, $\hat\mu_{\mathrm{strat}}$ often has smaller variance than $\hat\mu_{\mathrm{IPW}}$. On the other hand, the asymptotic limit of $\hat\mu_{\mathrm{strat}}$ can be shown to be

$$\sum_{j=1}^s \frac{E[\pi(X)m_1(X) \mid \pi^*(X) \in S_j^*]}{E[\pi(X) \mid \pi^*(X) \in S_j^*]}\, P(\pi^*(X) \in S_j^*),$$

where $\pi^*(X)$ is the limit of $\hat\pi(X)$, which agrees with the true $\pi(X)$ if the PS model is correct, and $S_j^*$ is that of $\hat S_j$. The ratio inside the above sum is the within-stratum average of $m_1(X)$ weighted proportionally to $\pi(X)$. Therefore, $\hat\mu_{\mathrm{strat}}$ is inconsistent unless $\pi(X)$ or $m_1(X)$ is constant within each stratum (cf. KS's discussion about crude DR in Section 2.4). The asymptotic bias depends on the joint behavior of $m_1(X)$ and $\pi(X)$. It can be substantial if $m_1(X)$ varies in a region where $\pi(X)$ is close to 0 but varies, so that values of $m_1(X)$ are weighted differentially: by a factor of 10, say, at two X's with $\pi(X) = 0.01$ and 0.1.
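The identity between the stratification estimator and IPW with $\hat\pi_{\mathrm{strat}}$ can be checked directly. The Python sketch below makes two assumptions: strata are formed by quantiles of $\hat\pi$ (five by default), and every stratum contains treated subjects. The simulated model is our own choice for illustration. In it, $m_1(X) = 1 + 2X$ varies where $\pi(X) = \mathrm{expit}(-1 + 2X)$ is small, and the result can be compared with $\mu_1 = 1$.

```python
import numpy as np

def mu_strat(T, Y, pi_hat, s=5):
    """Stratification estimator computed two ways; assumes each stratum has treated units."""
    edges = np.quantile(pi_hat, np.linspace(0.0, 1.0, s + 1))
    j = np.clip(np.searchsorted(edges, pi_hat, side="right") - 1, 0, s - 1)  # stratum of each unit
    counts = np.bincount(j, minlength=s)                   # subjects per stratum
    treated = np.bincount(j, weights=T, minlength=s)       # treated subjects per stratum
    TY = np.where(T == 1, Y, 0.0)
    pi_strat = (treated / counts)[j]                       # discretized propensity score
    ipw_form = np.mean(TY / pi_strat)                      # IPW with pi_strat
    treated_means = np.bincount(j, weights=TY, minlength=s) / treated
    avg_form = np.sum(counts / len(T) * treated_means)     # weighted average across strata
    return ipw_form, avg_form

rng = np.random.default_rng(8)
n = 5000
X = rng.standard_normal(n)
pi = 1.0 / (1.0 + np.exp(-(-1.0 + 2.0 * X)))
T = rng.binomial(1, pi)
Y = np.where(T == 1, 1.0 + 2.0 * X + rng.standard_normal(n), np.nan)   # mu_1 = 1
ipw_form, avg_form = mu_strat(T, Y, pi)
print(ipw_form, avg_form)
```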



Simulations 

KS designed a simulation setup with an OR and a PS model appearing to be "nearly correct." The response is generated as $Y = 210 + 27.4Z_1 + 13.7Z_2 + 13.7Z_3 + 13.7Z_4 + \epsilon$, and the propensity score is $\pi = \mathrm{expit}(-Z_1 + 0.5Z_2 - 0.25Z_3 - 0.1Z_4)$. Here $\epsilon$ and $(Z_1, Z_2, Z_3, Z_4)$ are independent, standard normal. The covariates seen by the statistician are $X_1 = \exp(Z_1/2)$, $X_2 = Z_2/(1 + \exp(Z_1)) + 10$, $X_3 = (Z_1Z_3/25 + 0.6)^3$ and $X_4 = (Z_2 + Z_4 + 20)^2$. The OR model is the linear model of Y against X, and the PS model is the logistic model of T against X.

In the course of replicating their simulations, we accidentally discovered that the following models also appear to be "nearly correct." The covariates seen by the statistician are the same $X_1$, $X_2$, $X_3$, but $X_4 = (Z_3 + Z_4 + 20)^2$. The OR model is linear and the PS model is logistic, as in KS. For one simulated dataset, Figures 1 and 2 present scatterplots and boxplots similar to Figures 2 and 3 in KS. For the OR model, the regression coefficients are highly significant and $R^2 = 0.97$. The correlation between the fitted values of Y under the correct and the misspecified OR models is 0.99. The correlation between the linear predictors under the correct and the misspecified PS models is 0.93.

Tables 1 and 2 summarize our simulations for KS models and for the alternative models described above. The raw version of $\hat\mu_{\mathrm{IPW}}$ is used. The estimators $\hat\mu^{\#}_{\mathrm{REG}}$ and $\tilde\mu^{\#}_{\mathrm{REG}}$ are defined as $\hat\mu_{\mathrm{REG}}$ and $\tilde\mu_{\mathrm{REG}}$ except that the score function for the PS model is dropped from $\hat\xi$. For these four estimators, $(1, \hat m_1)^{\mathrm T}$ is substituted for $\hat m_1$.
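For readers who wish to reproduce the setup, the following Python sketch generates one dataset from the KS design, optionally with the alternative $X_4 = (Z_3 + Z_4 + 20)^2$. It then compares fitted values of Y under the correct linear model in Z and the misspecified linear model in X, both fitted to treated subjects only. Function names and seeds are our own. The printed numbers vary with the seed, so they need not match the figures quoted in the text.

```python
import numpy as np

def ks_data(n, rng, alternative=False):
    """One dataset from the KS design; alternative=True uses X4 = (Z3 + Z4 + 20)^2."""
    Z = rng.standard_normal((n, 4))
    Y = 210 + 27.4 * Z[:, 0] + 13.7 * Z[:, 1:].sum(axis=1) + rng.standard_normal(n)
    lin = -Z[:, 0] + 0.5 * Z[:, 1] - 0.25 * Z[:, 2] - 0.1 * Z[:, 3]
    T = rng.binomial(1, 1.0 / (1.0 + np.exp(-lin)))       # in KS, Y is observed only if T = 1
    X1 = np.exp(Z[:, 0] / 2)
    X2 = Z[:, 1] / (1 + np.exp(Z[:, 0])) + 10
    X3 = (Z[:, 0] * Z[:, 2] / 25 + 0.6) ** 3
    X4 = (Z[:, 2] + Z[:, 3] + 20) ** 2 if alternative else (Z[:, 1] + Z[:, 3] + 20) ** 2
    return Z, np.column_stack([X1, X2, X3, X4]), T, Y

def ols_fitted(D, Y, T):
    """Linear OR model with intercept, fitted on treated subjects, evaluated for everyone."""
    Dd = np.column_stack([np.ones(len(Y)), D])
    coef, *_ = np.linalg.lstsq(Dd[T == 1], Y[T == 1], rcond=None)
    return Dd @ coef

rng = np.random.default_rng(6)
Z, X, T, Y = ks_data(200, rng, alternative=True)
fit_correct = ols_fitted(Z, Y, T)                         # correct OR model (linear in Z)
fit_alt = ols_fitted(X, Y, T)                             # misspecified OR model (linear in X)
gap = np.abs(fit_correct - fit_alt)
print("correlation of fitted values:", np.corrcoef(fit_correct, fit_alt)[0, 1])
print("quartiles of |difference|:", np.quantile(gap, [0.25, 0.5, 0.75]), "max:", gap.max())
```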

KS found that none of the DR estimators they 
tried improved upon the performance of the OR es- 
timator; see also Table 1. This situation is consis- 
tent with the discussion in Section 2. The theory of 
DR estimation does not claim that a DR estimator 
is guaranteed to perform better than the OR esti- 
mator when the OR and the PS models are both 
misspecified, whether mildly or grossly. Therefore, 
KS's simulations serve as an example to remind us 
of this indefinite comparison. 

On the other hand, neither is the OR estimator guaranteed to outperform DR estimators when the OR model is misspecified or even "nearly correct." As seen from Table 2, $\hat\mu_{\mathrm{OR}}$ yields greater RMSE values than the DR estimators $\hat\mu_{\mathrm{WLS}}$, $\tilde\mu_{\mathrm{REG}}$ and $\tilde\mu^{\#}_{\mathrm{REG}}$ when the alternative, misspecified OR and PS models are both used. For n = 200, the bias of $\hat\mu_{\mathrm{OR}}$ is 2.5 and that of $\tilde\mu_{\mathrm{REG}}$ is 0.44. These differ substantially from the corresponding biases $-0.56$ and $-1.8$ in Table 1, where KS models are used.

COMMENT

Fig. 1. Scatterplots of response versus covariates (alternative models).

Fig. 2. Boxplots of covariates and propensity scores (alternative models).

Table 1
Numerical comparison of estimators of $\mu_1$ (KS models)

n = 200
                    π-model correct                 π-model incorrect
Method          Bias  % Bias    RMSE     MAE    Bias  % Bias    RMSE     MAE
IPW            0.080    0.64    12.6    6.11      16      32    52.7    8.99
strat           -1.1     -37    3.20    2.04    -2.9     -93    4.28       ?

                    y-model correct                 y-model incorrect
OLS           -0.025   -0.99    2.47    1.68   -0.56     -17    3.33    2.19

                    y-model correct                 y-model incorrect
π-model correct
AIPW fix      -0.024   -0.96    2.47    1.67    0.24     6.9    3.44       ?
WLS           -0.025    -1.0    2.47    1.68    0.39      13    2.99    1.89
REG tilde     -0.025    -1.0    2.47    1.69    0.14     5.2    2.73    1.76
REG hat        -0.52     -20    2.63    1.73   -0.52     -19    2.81    1.78
REG# tilde    -0.024   -0.98    2.47    1.68    0.24     8.9    2.74    1.79
REG# hat       -0.21    -8.4    2.48    1.68  -0.086    -3.2    2.65    1.74
π-model incorrect
AIPW fix      -0.026    -1.0    2.48    1.71    -5.1     -44    12.6    3.75
WLS           -0.026    -1.0    2.47    1.70    -2.2     -69    3.91    2.77
REG tilde     -0.027    -1.1    2.47    1.71    -1.8     -62    3.47    2.41
REG hat        -0.45     -18    2.60    1.71    -2.2     -76    3.68    2.53
REG# tilde    -0.026    -1.1    2.47    1.69    -2.0     -68    3.56    2.47
REG# hat       -0.13    -5.3    2.48    1.68    -2.2     -77    3.68    2.59

n = 1000
                    π-model correct                 π-model incorrect
Method          Bias  % Bias    RMSE     MAE    Bias  % Bias    RMSE     MAE
IPW            0.098     2.0    4.98    3.04      68     9.2     746    14.7
strat           -1.1     -86    1.71    1.24    -2.9    -214    3.22    2.94

                    y-model correct                 y-model incorrect
OLS           -0.047    -4.0    1.15   0.770   -0.85     -56    1.75    1.15

                    y-model correct                 y-model incorrect
π-model correct
AIPW fix      -0.046    -4.0    1.15   0.766   0.043     2.6    1.63    1.11
WLS           -0.046    -4.0    1.15   0.769    0.12     8.7    1.37   0.943
REG tilde     -0.046    -4.0    1.15   0.773   0.048     3.9    1.23   0.809
REG hat        -0.13     -11    1.16   0.796  -0.077    -6.3    1.23   0.812
REG# tilde    -0.046    -4.0    1.15   0.770   0.092     7.3    1.26   0.870
REG# hat      -0.083    -7.2    1.15   0.768   0.024     1.9    1.24   0.857
π-model incorrect
AIPW fix       -0.10    -6.5    1.61   0.769     -26    -8.5     308    5.56
WLS           -0.048    -4.1    1.15   0.764    -3.0    -203    3.38    3.05
REG tilde     -0.046    -4.0    1.15   0.764    -1.7    -120    2.21    1.73
REG hat       -0.045    -3.9    1.16   0.786    -1.7    -122    2.24    1.75
REG# tilde    -0.046    -4.0    1.15   0.763    -2.1    -152    2.48    2.04
REG# hat      -0.058    -5.0    1.16   0.771    -2.2    -158    2.57    2.15

The consequences of model misspecification are difficult to study, because the degree and direction of model misspecification are subtle, even elusive. For the dataset examined earlier, the absolute differences between the (highly correlated) fitted values of Y under the correct and the alternative, misspecified OR models present a more serious picture of model misspecification. In fact, the quartiles of these absolute differences are 2.0, 3.2 and 5.1, and the maximum is 20.

For both Tables 1 and 2, the DR estimators $\tilde\mu_{\mathrm{REG}}$ and $\tilde\mu^{\#}_{\mathrm{REG}}$ perform better overall than the other DR estimators $\hat\mu_{\mathrm{AIPW,fix}}$ and $\hat\mu_{\mathrm{WLS}}$. Consider the case where the PS model is correct but the OR model is misspecified. Compared with $\hat\mu_{\mathrm{WLS}}$, $\tilde\mu_{\mathrm{REG}}$ then has MSE reduced by 15-20% (Table 1) or by 20-25% (Table 2), which agrees with the optimality property of $\tilde\mu_{\mathrm{REG}}$ in Proposition 4(ii). Even the simplified estimator $\tilde\mu^{\#}_{\mathrm{REG}}$ earns similar efficiency, although the gain is not guaranteed in theory. The non-DR estimators $\hat\mu_{\mathrm{REG}}$ and $\hat\mu^{\#}_{\mathrm{REG}}$ sometimes have sizeable biases even when the PS model is correct.

Table 2
Numerical comparison of estimators of $\mu_1$ (alternative models)

n = 200
                    π-model correct                 π-model incorrect
Method          Bias  % Bias    RMSE     MAE    Bias  % Bias    RMSE     MAE
IPW            0.080    0.64    12.6    6.11      18      34    55.7    9.61
strat           -1.1     -37    3.20    2.04    -1.1     -36    3.22       ?

                    y-model correct                 y-model incorrect
OLS           -0.025   -0.99    2.47    1.68     2.5      80    4.04    2.73

                    y-model correct                 y-model incorrect
π-model correct
AIPW fix      -0.024   -0.96    2.47    1.67    0.53      14    3.82    2.32
WLS           -0.025    -1.0    2.47    1.68    0.83      28    3.09    1.96
REG tilde     -0.025    -1.0    2.47    1.69    0.33      13    2.63    1.71
REG hat        -0.52     -20    2.63    1.73   -0.34     -13    2.70    1.74
REG# tilde    -0.024   -0.98    2.47    1.68    0.45      17    2.74    1.78
REG# hat       -0.21    -8.4    2.48    1.68    0.09     3.6    2.63    1.74
π-model incorrect
AIPW fix      -0.024   -0.97    2.48    1.71    -2.5     -21    12.2    2.72
WLS           -0.026   -0.10    2.47    1.70    0.33      11    3.11    2.05
REG tilde     -0.025   -0.10    2.47    1.71    0.44      16    2.74    1.80
REG hat        -0.42     -17    2.56    1.71  -0.026   -0.95    2.74    1.78
REG# tilde    -0.025    -1.0    2.47    1.69    0.31      11    2.83    1.80
REG# hat       -0.22    -8.9    2.48    1.71   0.035     1.3    2.76    1.77

n = 1000
                    π-model correct                 π-model incorrect
Method          Bias  % Bias    RMSE     MAE    Bias  % Bias    RMSE     MAE
IPW            0.098     2.0    4.98    3.04      80     8.5     951    16.8
strat           -1.1     -86    1.71    1.24       ?     -72    1.65    1.11

                    y-model correct                 y-model incorrect
OLS           -0.047    -4.0    1.15   0.770     2.2     152    2.67    2.21

                    y-model correct                 y-model incorrect
π-model correct
AIPW fix      -0.046    -4.0    1.15   0.766   0.061     3.3    1.87    1.17
WLS           -0.046    -4.0    1.15   0.769    0.22      16    1.39   0.957
REG tilde     -0.046    -4.0    1.15   0.773    0.12      10    1.21   0.818
REG hat        -0.13     -11    1.16   0.796  -0.012   -0.97    1.19   0.801
REG# tilde    -0.046    -4.0    1.15   0.770    0.14      12    1.25   0.849
REG# hat      -0.083    -7.2    1.15   0.768   0.069     5.7    1.22   0.826
π-model incorrect
AIPW fix       -0.12    -6.3    1.83   0.780     -31    -6.9     441    2.92
WLS           -0.048    -4.1    1.15   0.768   -0.55     -38    1.55    1.12
REG tilde     -0.044    -3.9    1.15   0.765    0.61      46    1.46   0.946
REG hat       -0.099    -8.5    1.16   0.787    0.57      43    1.45   0.910
REG# tilde    -0.045    -3.9    1.15   0.757    0.22      17    1.29   0.847
REG# hat       -0.16     -14    1.17   0.764    0.13      10    1.28   0.836



Summary 

One of the main points of KS is that two (mod- 
erately) misspecified models are not necessarily bet- 
ter than one. This point is valuable. But at the same 
time, neither are two misspecified models necessarily 
worse than one. Practitioners may choose to imple- 
ment either of the OR and the PS-DR approaches, 
each with its own characteristics. It is helpful for
statisticians to promote a common, rigorous under- 
standing of each approach and to investigate new 
ways for improvement. We welcome KS's article and 
the discussion as a step forward in this direction. 